Data Programming: Coursework - 2

Author: Sree Pradyumna Davuloori, MSc Data Science

1. Introduction

The recently concluded 2022 FIFA World Cup final was one of the most memorable finals ever witnessed. The tournament overall was very entertaining, with several surprising results and intriguing tactical trends. Some of the tactical trends are explained exceptionally well by Mark Carey of The Athletic in [1].

One particularly interesting tactical trend from Carey’s article was that teams who kept less possession of the ball than their opponent in a game earned more points per game. This suggests that teams who ceded possession to their opponent and focused on counter-attacking reaped rewards. Morocco and Japan were good examples of this.

The World Cup also displayed a wide gamut of playing styles, including the pass-to-death positional play of Spain that was ultimately unsuccessful, Morocco’s counter-attacking tactics, and the flexibility of the two best teams, Argentina and France.

However, international football is almost a different game when compared to club football. International teams and their managers have substantially less time on the training field to prepare and practice their tactics. Therefore, international teams are far less progressive compared to their club football counterparts and generally tend to focus on pragmatic football that gets the job done.

The focus of this paper will be on English football, specifically the English Premier League. The proposal hypothesized that English football has undergone an evolution in its playing style after influential coaches like Pep Guardiola and Jurgen Klopp joined English clubs around 2016. In the proposal, exploratory analysis was done on three major questions:

i. Are goalkeepers increasingly relying on short passes from Goal Kicks?

The analysis of data showed that the median length of goal kicks has reduced every season starting from 2017 and ending in 2022. The mean and median length of passes by the goalkeepers from open play also decreased.

ii. Are Goalkeepers sweeping up long balls behind their defensive line?

The Average distance of defensive actions metric was utilized to analyse whether goalkeepers are stepping out and sweeping long balls. The analysis showed that there was an increase in teams’ usage of sweeping as a tactic.

iii. Are teams making more short passes?

This was analysed using the mean number of short passes made each season, from 2017 until 2022. The result showed fluctuations from season to season and needs further detailed analysis.

Therefore, a bird’s-eye view of the data and some initial analysis provided some evidence that changes are occurring in the English game, which can be further explored and explained through a detailed analysis.

2. Aims and Objectives

The original aim of the paper was to compare the five-year periods before and after the introduction of the two influential coaches Pep Guardiola and Jurgen Klopp, and contrast the state of English football before and after they set foot in England.

Unfortunately, Fbref, which is the data source for this paper, has detailed data starting only from the 2017-2018 season.

Therefore, this paper will aim to produce a comprehensive report on the evolution of English football over the past few years, with a focus on the seasons between 2017-18 and 2021-2022, the last fully completed season.

The report will focus on analysing the evolution of four main facets of football:
i. Goalkeeping
ii. Defence
iii. Passing and Possession of the ball
iv. Pressing

Tactically, these are the four phases of the game that most modern football managers plan most meticulously.

The evolution of these four phases will be analysed through a series of questions or hypotheses. Data visualisations will be utilised to present the analysis and the answer to the questions.

The evolution of the attacking phase of the game, i.e. scoring goals, is not being analysed in detail. Scoring goals is important; goals win games! But the attacking phase of the game is the most laissez-faire phase and the players have a high degree of freedom. This is the area that requires the most creativity and spontaneous innovation by the players and is therefore the phase that is least tinkered with by managers. Thierry Henry, who played under Pep Guardiola’s coaching at Barcelona, has talked about how Guardiola allowed players a lot of freedom after they got the ball to the final third of the pitch, which is when the team is in the attacking phase. Until the ball reached the final third, however, Guardiola put clear rules in place [2].

It is prudent to discuss the caveats to the analyses presented in this report:

3. Literature Review

There are some excellent articles online that discuss how football has changed over time and in the past decade.

An article by an unknown author in Soccerblade [4] details five factors due to which football has evolved over time, including technology, media, social media, tactics and style of play. The article talks about how sports science has entered football and has transformed the fitness levels of modern athletes through recovery methods like physical therapy, ice baths and cryotherapy, and through data-driven analysis of their workloads for injury prevention. The article also gives a brief history of tactical changes in the game. The 2-2-6 formation was used in the early 1900s, and is in a way making a comeback as the in-possession system used by modern managers to create space in attack. The article briefly covers the Total Football system pioneered by famous Dutchmen Johan Cruyff and Rinus Michels, which laid the foundations for Pep Guardiola’s style of play; the Catenaccio implemented by Helenio Herrera, which is an ultra-defensive style of play; and Pep Guardiola’s tiki-taka and its resounding success at Barcelona. The article also mentions the influence of Pep Guardiola and Barcelona on the modern game.

Daniel Taylor’s article in The Athletic [5] details the influence of Pep Guardiola on lower-league teams in English football. The article by Taylor is the closest work to this report. However, Taylor’s article focuses on examining Guardiola’s direct influence on managers and teams in the lower leagues of English football, whereas this report will take a substantially more data-driven approach to examine the broader evolution of the English game over the past few years, coinciding with Guardiola and Klopp’s arrival in English football.

Taylor interviews managers like Ian Evatt of Bolton Wanderers and Ian Burchnall of Notts County (at the time), who express their desire to play an attractive possession-based style focused on winning in style. The article notes other managers operating in the lower leagues, like Ben Garner, Rob Edwards and Liam Manning, who are attempting to move away from the traditional long-ball, direct approach played at their teams and employ more of Guardiola’s style. Taylor shows, through data collected by Opta, that the number of long balls has decreased in all of the top 4 leagues of English football. The most fascinating part of the article is the coverage of how even non-league teams are adopting the possession-based style favoured by Guardiola. This includes teams like Dorking Wanderers and Gateshead, operating in the National Leagues, who are playing an expansive possession style of football with success and enthralling fans. Ian Evatt and, surprisingly, Wayne Rooney, formerly of Manchester United, are the only ones in the article who directly credit Guardiola as their influence, but it is clear that Guardiola has inspired many managers in lower leagues and non-league football to adopt his principles.

Bill Connelly’s article in ESPN [6], published in 2020, details the changes in football over the past decade. There are two interesting findings, both gleaned through Connelly’s statistical approach to data from Opta. One, football has seen an increasing focus on efficiency, with a rise in possession, a rise in pass completion rate, a rise in more patient attacks and a drop in the number of times possession changes feet. The number of tackles and fouls decreased, but the number of dribbles increased. This points to an increase in the technical quality of the players, who are therefore better at dribbling. Two, there was an increased focus on pressing in the 2010s, with substantial increases in possessions won in the final third and ball recoveries, both of which measure high pressing.

The English Premier League themselves publish an article every year on their website that details the trends seen that year [7]. The 2021-2022 season trends showed a substantial increase in the number of possessions won in the final third, a clear sign of effective pressing strategies combined with more teams playing out from the back and risking losing possession in their own defensive third. The season also showed an increase in the number of fast, direct attacks, which could be the result of teams adapting to their opponent’s patient possession by staying compact and springing a fast direct attack when they get the ball.

4. Data Description and Metrics Explained

As described in the proposal, this paper will utilize data provided by Fbref [8]. Fbref partners with Opta, which collects detailed statistics about football.

The paper will utilise this data from Fbref:

  1. Each team’s performance in the English Premier League for a whole season: The picture below shows passing performance statistics for all 20 teams across the 2022-2023 season: image.png

Picture courtesy: Fbref

  2. Opponent’s performance against each team in the English Premier League for a whole season: The picture below shows the performance of opponents against each team in the English Premier League, across the 2021-2022 season. image-2.png

Picture courtesy: Fbref

For example, the first row shows how many tackles, blocks, interceptions, etc. were recorded against Brentford by their opponents across the entire 2021-2022 season.

  3. Player performance data across an entire season: The picture below shows defensive performance data for every player that played in the 2021-2022 season. image-3.png

Picture courtesy: Fbref

For example, sorted in descending order of tackles in the attacking third, the picture shows Bernardo Silva, Martin Odegaard and Marc Cucurella as the top three players with the most tackles in the attacking third.

4.1 Glossary on Metrics and Stats used in the report

This section will give a brief introduction to some of the important concepts of football and the metrics used in this report. Readers with thorough knowledge of football can skip this section of the report.

i. Four main phases in football: In a football game, there are four phases of play. Each team is at one of these phases of play, at any moment in time. They are described in the next four points.

ii. Defensive phase: This phase occurs when a team is without the ball because their opponent has it. Different teams employ different tactics in this phase. Some teams look to aggressively pressure the opponent to win the ball back quickly and launch their own attacks. Others look to station themselves compactly, intending to lure the opponent into leaving some space that they can exploit once the ball is won back; this is called a counter-attack.

iii. Offensive phase: This phase occurs when a team is in possession of the ball. Some managers, like Louis Van Gaal, have explained how they break this phase down further: the construction phase, when the build-up of the attack has just started around the team’s own defensive third; settled possession, which occurs when the team is in control of the ball and has settled into their possession structure; the chance creation phase, which occurs when the team has entered the dangerous areas near the opposition’s goal and is looking to create a scoring chance; and the chance finishing phase, which is when the team’s players look to finish a chance.

iv. Defensive transition phase: This phase occurs as soon as the team has lost the ball. The proposal explained that the fundamental principle in any football system is to spread across the pitch when the team has the ball, so that the opponent can be pulled apart and chances can be created through the exploitation of space. The defensive transition phase is therefore dangerous: the team is spread all across the pitch and disorganised when the ball is lost, and the opposition can counter-attack to create a chance. This is the phase into which modern managers have put a lot of meticulous planning, especially in the last decade. Teams employ one of two strategies in this phase. They either immediately put pressure on the opponent to win the ball back, which is called counter-pressing, or they immediately retreat into their compact defensive shape so that the opponent cannot counter-attack effectively.

v. Offensive transition phase: This phase occurs as soon as the ball has been won back from the opponent. Teams usually launch a quick counter-attack to utilise the space left by the opponent and create a chance.

vi. Defensive, Middle and Final thirds of the pitch: The football pitch has three imaginary divisions that help distinguish what a team does in each area. The defensive third is the area of the pitch closest to the goal a team is protecting. The attacking or final third is the area of the pitch closest to the goal a team is attacking to try and score goals. The middle third is between the defensive and attacking thirds. This is explained well through pictures in [9].

image.png

Picture courtesy: https://www.rookieroad.com

image-2.png

Picture courtesy: https://www.rookieroad.com
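The division into thirds can be illustrated with a minimal sketch. The function name `pitch_third`, the 105-metre pitch length and the convention of measuring `x` from the team's own goal line are assumptions made for this illustration; they are not taken from the Fbref data.

```python
def pitch_third(x: float, pitch_length: float = 105.0) -> str:
    """Return which third of the pitch a position falls in.

    x is the distance (in metres) from the goal line the team is defending.
    pitch_length is an assumed value; real pitches vary in length.
    """
    if not 0.0 <= x <= pitch_length:
        raise ValueError("x must lie between 0 and pitch_length")
    third = pitch_length / 3.0
    if x < third:
        return "defensive"
    if x < 2.0 * third:
        return "middle"
    return "final"


# A ball won 100 m from the team's own goal line is in the final third.
print(pitch_third(100.0))
```

Because the thirds are relative to the direction of attack, the same spot on the pitch is in one team's final third and the other team's defensive third.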

vii. Possession: The act of having the ball is called possession. There is also a metric called possession that shows how much of the ball a team has over a whole game. Different stats companies calculate this metric through different methods. Opta, the stats provider for Fbref, calculates it as the number of completed passes made by a team divided by the number of completed passes made in the entire game. Therefore, possession is not calculated from the amount of time spent on the ball, but rather from the number of passes made.
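As a rough sketch of the pass-count definition of possession described above, the calculation can be written as follows. The function name `possession_share` and the example pass counts are hypothetical and used only for illustration.

```python
def possession_share(team_passes: int, opponent_passes: int) -> float:
    """Possession percentage based on completed passes rather than time on the ball."""
    total = team_passes + opponent_passes
    if total == 0:
        raise ValueError("no completed passes recorded in the game")
    return 100.0 * team_passes / total


# Hypothetical game: 600 completed passes against the opponent's 400.
print(possession_share(600, 400))  # 60.0
```

The two teams' shares always add up to 100, so a single value per game is enough to describe both sides.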

viii. Pressing: Pressing occurs when a team actively pressures the opponent’s players to win the ball back. Teams employ pressing instead of passively staying in their defensive shape waiting for their opponent to make a mistake. However, pressing in the modern game is meticulously designed by managers. Teams do not aimlessly run around to get the ball back, because this wastes energy and achieves the opposite effect by leaving gaps that the opponent then exploits. Pressing is strategically employed in certain phases of the game and areas of the pitch. High-pressing, which is done near the opponent’s goal, is one of the biggest changes in the modern game. High-pressing is profitable because the ball is won back near the goal being attacked (i.e. the opponent’s goal), so the team is only a few correct passes away from creating a good chance to score. High-pressing should not be confused with counter-pressing. Counter-pressing is the act of pressuring the opponent as soon as the ball is lost so that the team can quickly regain the ball. Counter-pressing is not specific to any area of the pitch. Pressing is explained well by Spielverlagerung in [10].

ix. Sweeping: Sweeping ties into the concept of high-pressing. To press the opponent high up the pitch, teams usually push all their players high up the pitch, including the defenders, to get the best possible chance of overwhelming the opponent near their own goal and get the ball back in an advantageous area. This also happens in a very settled possession sequence, when the ball is under the team’s control and therefore the team’s players have all moved up the pitch in hopes of creating good chances. When either of the two situations happens, there is a huge amount of space left behind the team’s defence. A pass over the top of the defence or through the defence can eliminate most of the team’s players and let the opponent’s strikers or forwards through on goal, one versus one with the goalkeeper. Sweeping is the action of a goalkeeper stepping out of their goal to clear the ball when the defenders have been bypassed. Traditionally, goalkeepers were never comfortable doing this, and even today a lot of goalkeepers are uncomfortable stepping out to clear danger. However, players like Manuel Neuer have pioneered sweeping and the trend has been continued by keepers like Ederson. Sweeping is slowly becoming an essential part of modern progressive football tactics.

x. Average Length of goal kicks by the goalkeeper: This is a metric that details the length of goal kicks by a goalkeeper. In 2019, the laws were altered to allow the team’s outfield players to receive goal kicks inside their own penalty area, and this has led to fascinating use cases of goal kicks. This is explained brilliantly by Michael Cox of The Athletic here [11]. The average length of goal kicks metric is a great way to measure how a team and its manager approach building their possession sequences. A team that focuses on possession almost always makes their first few passes short, usually from the goalkeeper to the defenders. A team focused on direct attacks uses the goal kick to launch the ball forward to usually a tall forward, aiming to get near the opponent’s goal quickly.

xi. Average Length of passes by the Goalkeeper in open play: This metric details the length of passes made by the goalkeeper in open play. Open play does not include goal kicks.

xii. Passes Launched percentage and Goal kicks Launched percentage: In football, a pass being launched means that the pass has been kicked long. Therefore these two metrics measure the percentage of times the goalkeeper has kicked the ball long, from open play and goal kicks respectively.
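A launched percentage can be sketched as the share of kicks longer than some cut-off. The 40-yard default threshold and the function name `launched_percentage` are assumptions for this sketch; the exact cut-off used by the data provider should be checked against the Fbref glossary.

```python
from typing import Sequence


def launched_percentage(pass_lengths_yards: Sequence[float],
                        threshold_yards: float = 40.0) -> float:
    """Percentage of passes (or goal kicks) longer than the threshold.

    threshold_yards is an assumed cut-off for what counts as a long kick.
    """
    if not pass_lengths_yards:
        return 0.0
    launched = sum(1 for length in pass_lengths_yards if length > threshold_yards)
    return 100.0 * launched / len(pass_lengths_yards)


# Hypothetical goal kicks: two short passes to defenders, two long kicks.
print(launched_percentage([12.0, 55.0, 8.0, 62.0]))  # 50.0
```

The same function applies to open-play passes and to goal kicks; only the list of lengths passed in differs.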

xiii. Average Distance of Defensive actions: This metric details the distance from the goal at which the goalkeeper performs defensive actions, which include claiming the ball or sweeping it, i.e. clearing the ball to avert danger. The higher this distance is, the more proactive the goalkeeper is in coming out of their area to prevent dangerous situations.

xiv. Defensive actions outside the area: This metric counts the number of times a goalkeeper performs a defensive action outside the penalty area. This is a good proxy metric for measuring the amount of aggressive sweeping employed by a team through their goalkeeper.
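The two goalkeeper sweeping metrics above can be sketched from a list of defensive-action distances. This is a simplified illustration: the function name `sweeping_summary` and the sample distances are hypothetical, and an action is treated as outside the area purely by its distance from the goal line (the penalty area is 18 yards deep), ignoring its sideways position.

```python
from statistics import mean
from typing import Sequence, Tuple

PENALTY_AREA_DEPTH_YARDS = 18.0  # depth of the penalty area from the goal line


def sweeping_summary(distances_from_goal_yards: Sequence[float]) -> Tuple[float, int]:
    """Return (average distance of defensive actions, count of actions outside the area).

    Simplification: only depth from the goal line is considered.
    """
    if not distances_from_goal_yards:
        raise ValueError("at least one defensive action is required")
    average = mean(distances_from_goal_yards)
    outside = sum(1 for d in distances_from_goal_yards if d > PENALTY_AREA_DEPTH_YARDS)
    return average, outside


# Hypothetical goalkeeper with four defensive actions in a game.
print(sweeping_summary([6.0, 10.0, 20.0, 24.0]))  # (15.0, 2)
```

A higher average distance and a higher count outside the area both point to a goalkeeper who sweeps more aggressively.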

xv. Tackles in the final third: This metric measures the number of tackles made by a team or a player in the final/attacking third. As discussed previously, this is the area of the pitch closest to a goal being attacked and is therefore a great place to win the ball back with the potential to create a scoring chance.

xvi. xG or Expected goals: Expected goals, or the xG metric, shows the likelihood that a given shot will end up being a goal. xG ranges from 0, meaning a zero percent chance of being a goal, to 1, meaning a hundred percent chance of being a goal. It comes with a couple of important caveats: xG does not consider the shooting player’s finishing quality or technique, or even the kind of shot they will take, and it does not account for the goalkeeper’s ability to save the shot. What xG does account for is the situation the shot is taken in, including the location and angle from where the shot will be taken, the body part used to take the shot, the type of pass that resulted in the shot and the type of possession that resulted in the shot. xG can be a flawed metric given its caveats. However, if used for what it was designed to indicate, it can be effective in understanding a team’s ability to generate high-quality chances over a substantial run of games, from around 10 games to a full season, and it can quantify an individual player’s ability to convert chances into goals.

xvii. Non-penalty xG: Penalties have a high probability of being converted into a goal, being a one-versus-one duel with the goalkeeper taken just 12 yards away from the goal. Penalties have been assigned a constant xG value of about 0.76 (varies slightly between data providers). Therefore, excluding penalties from the xG calculation gives a better indication of a team’s ability to create good scoring chances and a player’s ability to convert said chances.
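To illustrate how xG and non-penalty xG are aggregated, the sketch below sums per-shot xG values with and without penalties. The `Shot` class, the function `total_xg` and the shot values are hypothetical; real per-shot xG values come from the data provider's model.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Shot:
    xg: float                 # probability of the shot being scored, between 0 and 1
    is_penalty: bool = False
    is_goal: bool = False


def total_xg(shots: Iterable[Shot], include_penalties: bool = True) -> float:
    """Sum the xG of all shots, optionally leaving penalties out (non-penalty xG)."""
    return sum(s.xg for s in shots if include_penalties or not s.is_penalty)


# Hypothetical game: one penalty (about 0.76 xG) and two open-play shots.
shots = [
    Shot(xg=0.76, is_penalty=True, is_goal=True),
    Shot(xg=0.10),
    Shot(xg=0.30, is_goal=True),
]
goals = sum(s.is_goal for s in shots)
print(round(total_xg(shots), 2))                           # xG including the penalty
print(round(total_xg(shots, include_penalties=False), 2))  # non-penalty xG
print(goals)
```

Comparing goals scored with the summed xG over many games gives a rough view of finishing, while the non-penalty total isolates chance creation from open play and set pieces.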

xviii. Shot Creating actions: A shot-creating action is anything done on a football field that immediately leads to a shot on goal. This can include a pass, a dribble, a set piece like a free kick or corner, a foul which leads to a shot from the resulting set piece and even a tackle to win the ball back which leads to a shot. Technically, shot-creating actions include the two previous actions directly before the shot. Therefore, two separate players in a team can get credit for the shot creating action for the same shot.
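The rule that the two actions directly before a shot earn credit can be sketched as below. This is a simplified illustration, not the data provider's implementation: the function name `shot_creating_actions` and the event format are hypothetical, and all events are assumed to belong to one team's uninterrupted possession.

```python
from collections import Counter
from typing import List, Tuple

Event = Tuple[str, str]  # (player name, action type), in chronological order


def shot_creating_actions(events: List[Event]) -> Counter:
    """Credit the players responsible for the two actions directly before each shot."""
    credits: Counter = Counter()
    for i, (_, action) in enumerate(events):
        if action == "shot":
            # Look back at most two events before the shot.
            for prev_player, _ in events[max(0, i - 2):i]:
                credits[prev_player] += 1
    return credits


# Hypothetical possession: A passes, B dribbles, C passes, D shoots.
events = [("A", "pass"), ("B", "dribble"), ("C", "pass"), ("D", "shot")]
credits = shot_creating_actions(events)
print(credits["B"], credits["C"], credits["A"])  # 1 1 0
```

In this example two different players, B and C, each receive a shot-creating action for the same shot, while the earlier pass by A is not credited.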

xix. Tackles: A tackle is a direct duel by the defensive team’s player with the opponent’s player on the ball. A tackle can be attempted without being successful, which means the defensive player tried to win the ball back but failed. Fbref data contains metrics measuring both tackles attempted and tackles completed.

xx. Progressive Passes: As defined by Fbref, a progressive pass is a completed pass that moves the ball at least 10 yards closer to the opponent’s goal line than its furthest point in the last six passes, or any completed pass into the penalty area. Passes made from the defending 40% of the pitch are excluded. The intention of this exclusion is to leave out clearances made from near the team’s own goal and to only record passes that intentionally move the team up the pitch and towards the attacking third.

xxi. Passes entering the final third: A final third entering pass is any completed pass that reaches the final third of the pitch.

xxii. Through Balls: A through ball is any pass that sends a teammate into open space towards the goal being attacked. While not always the case, through balls have a high chance of leading to a dangerous scoring chance. The downside is that they are extremely difficult to make, needing eye-of-the-needle passes.

5. Data gathering and setup

In this section, steps will be taken to prepare the data for analysis. The process will include:

Importing the necessary Python libraries: